Exploring and Predicting Severe Road Traffic Crashes with Machine Learning Models

Richard Wen
rwen@ryerson.ca

1.0 Setup

First, we need to import the required libraries.

In [154]:
import calendar
import cufflinks as cf
import plotly
import plotly.graph_objects as go
import pandas as pd

from pandas.api.types import is_string_dtype
from sklearn.ensemble import RandomForestClassifier

Then, we have to configure cufflinks and plotly for offline notebook plotting. We also set the cufflinks plot theme and the number of columns to display when we present tables.

In [174]:
cf.go_offline()
cf.set_config_file(theme='white')
plotly.offline.init_notebook_mode()
pd.set_option('display.max_columns', None)

Finally, we apply some re-usable global settings for our notebook:

  • ksi_data_url: link to the Toronto Public Safety Data Portal .csv Killed or Seriously Injured (KSI) data
  • map_style: map style for base maps (see plotly mapbox styles)
  • map_margins: margins for the map, we decrease this to give more space for the map (except for the top, which contains the title)
In [99]:
ksi_data_url = 'https://opendata.arcgis.com/datasets/88e8040e02d5493eb163e454140d3a34_0.csv?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D'
map_style = 'carto-positron'
map_margins = {'l': 0, 'r': 0, 'b': 0, 't': 35}

2.0 Data

This experiment uses the Killed or Seriously Injured (KSI) data available from the Toronto Police Public Safety Data Portal.

Here is a short summary from the glossary documentation of the data:

The Killed and Seriously Injured (KSI) data is a subset dataset from all traffic collision events. The source of the data comes from police reports where an officer attended an event related to a traffic collision.

Please note that this dataset does not include all traffic collision events. The KSI data only includes events where a person sustained a major or fatal injury in a traffic collision event. The following definitions relate to the severity of injury used to classify the events in this dataset.

  • Major Injury: A non-fatal injury that is severe enough to require the injured person to be admitted to hospital, even if only for observation at the time of the collision. Includes: fracture, internal injury, severe cuts, crushing, burns, concussion, severe general shocks.
  • Fatal: Fatal injury (person sustains bodily injuries resulting in death), only those cases where death occurs in less than 366 days as a result of the collision. “Fatal” does not include death from natural causes (heart attack, stroke, epileptic seizure, etc.) or suicide.
  • Note: Other injury types, including minor or none, are associated with every individual included in the event. The KSI data includes a record (row) for every person involved in the collision event regardless of their level of injury; it includes everyone who was involved in a particular collision event. The field “Index” provides an arbitrary unique identification for every record in the entire dataset.

The “ACCNUM” is a unique identification for each traffic collision event. Since the data includes every person involved in a collision event, this identification is duplicated. Please note that this number is not unique and it may repeat year over year. Careful consideration must be made when creating a subset for unique events, as the detailed information provided is for every person involved and its associated role and information may be lost.

For example, the event with ACCNUM=6000607400 has 5 persons involved in the collision (5 records). The field “INVTYPE” indicates the role of the person in the collision event. The “INVAGE” indicates the age range of the person and the “INJURY” type indicates the level of injury they sustained. Therefore, this event can be interpreted in the following way:

  1. Passenger 1 age 20 to 24 sustained a fatal injury.
  2. Passenger 2 age 15 to 19 sustained a fatal injury.
  3. Passenger 3 age 20 to 24 sustained a major injury.
  4. Driver 1 age 20 to 24 sustained a major injury.
  5. Driver 2 age 45 to 49 sustained a major injury.
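
Because each row describes a person rather than a collision, one way to get to unique events is to group on ACCNUM together with YEAR, since the glossary warns that ACCNUM alone may repeat year over year. The following is a minimal sketch on a small made-up frame: the ACCNUM and YEAR values, and the names `people`, `IS_FATAL`, and `events`, are illustrative assumptions and are not taken from the real data.

```python
import pandas as pd

# Hypothetical person-level records: one row per person involved
people = pd.DataFrame({
    'ACCNUM': [1001, 1001, 1001, 1002, 1002, 1001],
    'YEAR':   [2015, 2015, 2015, 2015, 2015, 2016],
    'INJURY': ['Fatal', 'Major', 'None', 'Major', 'Minimal', 'Major'],
})

# Flag fatal injuries so they can be summed per event
people['IS_FATAL'] = (people['INJURY'] == 'Fatal').astype(int)

# Collapse to one row per (ACCNUM, YEAR) event
events = people.groupby(['ACCNUM', 'YEAR']).agg(
    persons=('INJURY', 'size'),      # number of people involved
    fatalities=('IS_FATAL', 'sum'),  # number of fatal injuries
).reset_index()
```

Here ACCNUM 1001 appears in both 2015 and 2016, so grouping on ACCNUM alone would merge two different events. Note that collapsing rows this way discards the per-person role and age information, which is the loss the glossary cautions about.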

2.1 Reading the Data

First, we try to download the data from ksi_data_url and read it. If that fails, we fall back to a saved copy of the data at data/ksi.csv.

In [100]:
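try:
    # Download the latest KSI data from the portal
    ksi = pd.read_csv(ksi_data_url)
except Exception:
    # Fall back to the saved local copy if the download or parsing fails
    ksi = pd.read_csv('data/ksi.csv')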
ksi
Out[100]:
X Y Index_ ACCNUM YEAR DATE TIME Hour STREET1 STREET2 OFFSET ROAD_CLASS District WardNum WardNum_X WardNum_Y Division Division_X Division_Y LATITUDE LONGITUDE LOCCOORD ACCLOC TRAFFCTL VISIBILITY LIGHT RDSFCOND ACCLASS IMPACTYPE INVTYPE INVAGE INJURY FATAL_NO INITDIR VEHTYPE MANOEUVER DRIVACT DRIVCOND PEDTYPE PEDACT PEDCOND CYCLISTYPE CYCACT CYCCOND PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY Hood_ID Neighbourh ObjectId
0 -79.412438 43.767462 80221198 4003162994 2014 2014-10-24T04:00:00.000Z 2315 23 YONGE ST HILLCREST AVE Major Arterial North York 18 0 0 32 0 0 43.767462 -79.412438 Intersection Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Sideswipe Passenger unknown None 0 Yes Yes Yes 51 Willowdale East (51) 12001
1 -79.516246 43.718318 80565670 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Driver 25 to 29 None 0 South Automobile, Station Wagon Going Ahead Driving Properly Normal Yes Yes Yes 26 Downsview-Roding-CFB (26) 12002
2 -79.516246 43.718318 80565671 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Passenger 30 to 34 None 0 Yes Yes Yes 26 Downsview-Roding-CFB (26) 12003
3 -79.516246 43.718318 80565672 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Pedestrian 10 to 14 Major 0 East Vehicle hits the pedestrian walking or running... Crossing, no Traffic Control Inattentive Yes Yes Yes 26 Downsview-Roding-CFB (26) 12004
4 -79.374309 43.662909 80632379 6002153175 2016 2016-12-04T05:00:00.000Z 2315 23 CARLTON STREET HOMEWOOD AVENUE Minor Arterial Toronto and East York 13 0 0 51 0 0 43.662909 -79.374309 Intersection At Intersection No Control Rain Dark, artificial Wet Non-Fatal Injury Pedestrian Collisions Driver 75 to 79 None 0 East Automobile, Station Wagon Turning Left Failed to Yield Right of Way Inattentive Yes Yes Yes 73 Moss Park (73) 12005
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12239 -79.330990 43.804445 5273577 1119725 2009 2009-07-26T04:00:00.000Z 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Driver unknown None 0 South Automobile, Station Wagon Going Ahead Yes Yes 116 Steeles (116) 996
12240 -79.330990 43.804445 5273578 1119725 2009 2009-07-26T04:00:00.000Z 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Pedestrian unknown Major 0 Other Other Yes Yes 116 Steeles (116) 997
12241 -79.330990 43.804445 5273579 1119725 2009 2009-07-26T04:00:00.000Z 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Pedestrian unknown Minimal 0 Other Other Yes Yes 116 Steeles (116) 998
12242 -79.228359 43.791693 80205836 4001787575 2014 2014-03-29T04:00:00.000Z 236 2 455 MILNER AVE Minor Arterial Scarborough 23 0 0 42 0 0 43.791693 -79.228359 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury SMV Other Driver 25 to 29 Major 0 East Delivery Van Going Ahead Lost control Ability Impaired, Alcohol Yes Yes Yes 132 Malvern (132) 999
12243 -79.524242 43.755858 80927971 7003085452 2017 2017-11-28T05:00:00.000Z 558 5 FINCH AVE W NORFINCH DR Major Arterial Etobicoke York 7 0 0 31 0 0 43.755858 -79.524242 Intersection At Intersection Traffic Signal Clear Dark, artificial Dry Non-Fatal Injury Turning Movement Driver 20 to 24 None 0 West Automobile, Station Wagon Going Ahead Driving Properly Normal Yes Yes Yes 25 Glenfield-Jane Heights (25) 1000

12244 rows × 60 columns

Notice here that there are a few columns with 'Yes' values in them, and that the DATE column is stored as text. We will need to preprocess these for our analyses.

2.2 Date Conversion

The date is an ISO 8601 string in UTC, and needs to be converted into a datetime object using pd.to_datetime.

In [101]:
ksi.DATE = pd.to_datetime(ksi.DATE)
ksi.DATE
Out[101]:
0       2014-10-24 04:00:00+00:00
1       2016-06-22 04:00:00+00:00
2       2016-06-22 04:00:00+00:00
3       2016-06-22 04:00:00+00:00
4       2016-12-04 05:00:00+00:00
                   ...           
12239   2009-07-26 04:00:00+00:00
12240   2009-07-26 04:00:00+00:00
12241   2009-07-26 04:00:00+00:00
12242   2014-03-29 04:00:00+00:00
12243   2017-11-28 05:00:00+00:00
Name: DATE, Length: 12244, dtype: datetime64[ns, UTC]
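
The time component of every DATE shown above is 04:00 or 05:00 UTC. A plausible reading, which is an assumption here and not something the glossary states, is that these are local midnights in Toronto, so DATE carries only the calendar date while the time of day lives in TIME. The sketch below checks this for two DATE strings copied from the table; the `America/Toronto` time zone name and the variable names are our own choices.

```python
import pandas as pd

# Two DATE strings copied from the table above
raw = pd.Series(['2014-10-24T04:00:00.000Z', '2016-12-04T05:00:00.000Z'])

# Parse as timezone-aware UTC timestamps
utc = pd.to_datetime(raw, utc=True)

# Convert to Toronto local time (assumed to be the intended local zone)
local = utc.dt.tz_convert('America/Toronto')

# Both timestamps fall on local midnight of the same calendar date
local_hours = local.dt.hour.tolist()
local_dates = local.dt.strftime('%Y-%m-%d').tolist()
```

For these two rows the calendar date is the same in UTC and in local time, so month and weekday derived from the UTC timestamps agree with the local ones.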

2.3 Dummy Coding

Next, we convert the variables that contain 'Yes' values to 1 (Yes) and 0 (anything else, including missing values) so that we can obtain counts for each variable and perform numerical computation.

In [102]:
# Get the string columns that contain 'Yes' values
ksi_str_columns = [c for c in ksi.columns if is_string_dtype(ksi[c])]
ksi_yes_columns = [c for c in ksi_str_columns if 'Yes' in ksi[c].values]

# If there are any 'Yes' columns, convert them to 1 and 0
if len(ksi_yes_columns) > 0:
    ksi[ksi_yes_columns] = ksi[ksi_yes_columns].apply(lambda c: [1 if r == 'Yes' else 0 for r in c])
ksi
Out[102]:
X Y Index_ ACCNUM YEAR DATE TIME Hour STREET1 STREET2 OFFSET ROAD_CLASS District WardNum WardNum_X WardNum_Y Division Division_X Division_Y LATITUDE LONGITUDE LOCCOORD ACCLOC TRAFFCTL VISIBILITY LIGHT RDSFCOND ACCLASS IMPACTYPE INVTYPE INVAGE INJURY FATAL_NO INITDIR VEHTYPE MANOEUVER DRIVACT DRIVCOND PEDTYPE PEDACT PEDCOND CYCLISTYPE CYCACT CYCCOND PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY Hood_ID Neighbourh ObjectId
0 -79.412438 43.767462 80221198 4003162994 2014 2014-10-24 04:00:00+00:00 2315 23 YONGE ST HILLCREST AVE Major Arterial North York 18 0 0 32 0 0 43.767462 -79.412438 Intersection Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Sideswipe Passenger unknown None 0 0 0 1 0 0 1 0 1 0 0 0 0 0 51 Willowdale East (51) 12001
1 -79.516246 43.718318 80565670 6001093797 2016 2016-06-22 04:00:00+00:00 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Driver 25 to 29 None 0 South Automobile, Station Wagon Going Ahead Driving Properly Normal 1 0 1 0 0 0 0 1 0 0 0 0 0 26 Downsview-Roding-CFB (26) 12002
2 -79.516246 43.718318 80565671 6001093797 2016 2016-06-22 04:00:00+00:00 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Passenger 30 to 34 None 0 1 0 1 0 0 0 0 1 0 0 0 0 0 26 Downsview-Roding-CFB (26) 12003
3 -79.516246 43.718318 80565672 6001093797 2016 2016-06-22 04:00:00+00:00 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Pedestrian 10 to 14 Major 0 East Vehicle hits the pedestrian walking or running... Crossing, no Traffic Control Inattentive 1 0 1 0 0 0 0 1 0 0 0 0 0 26 Downsview-Roding-CFB (26) 12004
4 -79.374309 43.662909 80632379 6002153175 2016 2016-12-04 05:00:00+00:00 2315 23 CARLTON STREET HOMEWOOD AVENUE Minor Arterial Toronto and East York 13 0 0 51 0 0 43.662909 -79.374309 Intersection At Intersection No Control Rain Dark, artificial Wet Non-Fatal Injury Pedestrian Collisions Driver 75 to 79 None 0 East Automobile, Station Wagon Turning Left Failed to Yield Right of Way Inattentive 1 0 1 0 0 0 0 0 0 1 0 0 0 73 Moss Park (73) 12005
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12239 -79.330990 43.804445 5273577 1119725 2009 2009-07-26 04:00:00+00:00 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Driver unknown None 0 South Automobile, Station Wagon Going Ahead 1 0 1 0 0 0 0 0 0 0 0 0 0 116 Steeles (116) 996
12240 -79.330990 43.804445 5273578 1119725 2009 2009-07-26 04:00:00+00:00 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Pedestrian unknown Major 0 Other Other 1 0 1 0 0 0 0 0 0 0 0 0 0 116 Steeles (116) 997
12241 -79.330990 43.804445 5273579 1119725 2009 2009-07-26 04:00:00+00:00 236 2 3330 PHARMACY Aven Minor Arterial Scarborough 22 0 0 42 0 0 43.804445 -79.330990 No Control Clear Dark Dry Non-Fatal Injury Pedestrian Collisions Pedestrian unknown Minimal 0 Other Other 1 0 1 0 0 0 0 0 0 0 0 0 0 116 Steeles (116) 998
12242 -79.228359 43.791693 80205836 4001787575 2014 2014-03-29 04:00:00+00:00 236 2 455 MILNER AVE Minor Arterial Scarborough 23 0 0 42 0 0 43.791693 -79.228359 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury SMV Other Driver 25 to 29 Major 0 East Delivery Van Going Ahead Lost control Ability Impaired, Alcohol 0 0 1 0 0 0 0 1 0 0 0 1 0 132 Malvern (132) 999
12243 -79.524242 43.755858 80927971 7003085452 2017 2017-11-28 05:00:00+00:00 558 5 FINCH AVE W NORFINCH DR Major Arterial Etobicoke York 7 0 0 31 0 0 43.755858 -79.524242 Intersection At Intersection Traffic Signal Clear Dark, artificial Dry Non-Fatal Injury Turning Movement Driver 20 to 24 None 0 West Automobile, Station Wagon Going Ahead Driving Properly Normal 0 0 1 0 0 0 0 1 0 1 0 0 0 25 Glenfield-Jane Heights (25) 1000

12244 rows × 60 columns
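
The same recoding can also be written without a Python-level loop. The following is a minimal sketch on a toy frame (the values and the names `flags` and `encoded` are made up for illustration); like the cell above, it treats every value other than 'Yes', including missing values, as 0.

```python
import pandas as pd

# Toy flag columns: 'Yes' where the flag applies, missing otherwise
flags = pd.DataFrame({
    'SPEEDING': ['Yes', None, 'Yes'],
    'ALCOHOL': [None, None, 'Yes'],
})

# Element-wise comparison gives booleans; cast them to 1 and 0
encoded = flags.eq('Yes').astype(int)

# Column sums now count the 'Yes' records per flag
yes_counts = encoded.sum().to_dict()
```

Whether a missing value really means "No" is an assumption of this encoding, so it is worth keeping in mind when interpreting the counts.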

3.0 Data Exploration

3.1 Variable Statistics

First, we explore some of the general statistics for each variable or column.

In [103]:
# Initial summary stats
ksi_summary = ksi.describe()

# Get the sum for all summary columns
ksi_sum = ksi[ksi_summary.columns].sum()
ksi_sum = ksi_sum.rename('sum')

# Add the sum to the summary stats
ksi_summary = pd.concat([ksi_summary, ksi_sum.to_frame().T])  # DataFrame.append was removed in pandas 2.0
ksi_summary
Out[103]:
X Y Index_ ACCNUM YEAR TIME Hour WardNum WardNum_X WardNum_Y Division_X Division_Y LATITUDE LONGITUDE FATAL_NO PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY Hood_ID ObjectId
count 12244.000000 12244.000000 1.224400e+04 1.224400e+04 1.224400e+04 1.224400e+04 12244.000000 1.224400e+04 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 12244.000000 1.224400e+04
mean -79.396212 43.710748 3.587528e+07 2.370242e+09 2.012689e+03 1.352408e+03 13.243711 3.506657e+03 2.047615 1.729173 4.121937 3.339268 43.710748 -79.396212 1.471905 0.408200 0.109523 0.902728 0.085920 0.058968 0.066808 0.002123 0.367037 0.166204 0.516498 0.078324 0.039775 0.027769 73.352499 6.122500e+03
std 0.103606 0.056192 3.625811e+07 3.074230e+09 3.136108e+00 6.249500e+02 6.257227 2.194788e+05 5.642203 4.947143 13.208229 11.146807 0.056192 0.103606 7.595429 0.491521 0.312307 0.296340 0.280257 0.235574 0.249700 0.046034 0.482016 0.372279 0.499748 0.268692 0.195437 0.164316 41.372891 3.534683e+03
min -79.638390 43.592047 0.000000e+00 1.284070e+05 2.008000e+03 0.000000e+00 0.000000 0.000000e+00 0.000000 0.000000 0.000000 0.000000 43.592047 -79.638390 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000e+00
25% -79.468615 43.662445 6.176591e+06 1.180965e+06 2.010000e+03 9.200000e+02 9.000000 7.000000e+00 0.000000 0.000000 0.000000 0.000000 43.662445 -79.468615 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 38.000000 3.061750e+03
50% -79.397290 43.702246 7.559770e+06 1.335254e+06 2.012000e+03 1.440000e+03 14.000000 1.300000e+01 0.000000 0.000000 0.000000 0.000000 43.702246 -79.397290 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 77.000000 6.122500e+03
75% -79.319248 43.756827 8.054227e+07 5.002033e+09 2.015000e+03 1.838000e+03 18.000000 2.200000e+01 0.000000 0.000000 0.000000 0.000000 43.756827 -79.319248 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 111.000000 9.183250e+03
max -79.125896 43.855445 8.109988e+07 8.008069e+09 2.018000e+03 2.359000e+03 23.000000 1.716222e+07 25.000000 24.000000 55.000000 55.000000 43.855445 -79.125896 78.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 140.000000 1.224400e+04
sum -972127.222223 535194.398736 4.392569e+11 2.902125e+13 2.464336e+07 1.655888e+07 162156.000000 4.293551e+07 25071.000000 21172.000000 50469.000000 40886.000000 535194.398736 -972127.222223 18022.000000 4998.000000 1341.000000 11053.000000 1052.000000 722.000000 818.000000 26.000000 4494.000000 2035.000000 6324.000000 959.000000 487.000000 340.000000 898128.000000 7.496389e+07

Here are a few things to note:

  • There are 12244 records, one for every person involved, so the number of unique KSI collision events is smaller
  • The data spans 2008 to 2018, as indicated by YEAR
  • Times in TIME range from 0 to 2359 (24-hour clock), and Hour (0 to 23) gives the corresponding hour of the day
  • X and Y have the same statistics as LONGITUDE and LATITUDE, and are the coordinates of the collisions inside Toronto
  • The sums for PEDESTRIAN, CYCLIST, AUTOMOBILE, and MOTORCYCLE are interesting as they describe the type of collision based on the method of transportation
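
The relationship between TIME and Hour can be made explicit: TIME is an integer in HHMM form without leading zeros, so integer division by 100 recovers the hour. Below is a small sketch using TIME values that appear in the table above; the helper name `split_time` is hypothetical.

```python
def split_time(hhmm):
    """Split an integer clock time such as 2315 into (hour, minute)."""
    hour, minute = divmod(int(hhmm), 100)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f'not a valid 24-hour time: {hhmm}')
    return hour, minute

# TIME values taken from the rows shown earlier (Hour was 23, 2, and 5)
examples = [split_time(t) for t in (2315, 236, 558)]
```

The resulting hours match the Hour column for those rows, which suggests Hour is simply derived from TIME.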

With the information above, we can start with plotting the number of collisions per year.

In [104]:
group_by_year = pd.Grouper(key = 'DATE', freq = 'Y')
ksi_yearly = ksi.groupby(group_by_year).DATE.count()
ksi_yearly.iplot(title = 'KSI Collisions Per Year')

The plot above shows us that there was a noticeably sharp decline in the number of KSI collisions from 2013 to 2015, while the collisions seem to slowly rise again afterwards. There is also a sharp increase from 2012 to 2013.

I wonder whether government road safety measures, vehicle changes, or driver behaviours in Toronto are behind these sharp increases and decreases.
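
One caveat is that the yearly plot counts records (people), not collision events, so a few collisions with many people involved can shift a year's total. A minimal sketch of how both counts could be compared side by side is shown below on made-up data; the frame `records` and its values are hypothetical, and only the column names follow the KSI data.

```python
import pandas as pd

# Hypothetical person-level records
records = pd.DataFrame({
    'ACCNUM': [1, 1, 1, 2, 3, 3],
    'YEAR':   [2015, 2015, 2015, 2015, 2016, 2016],
})

# Per year: number of rows (people) and number of distinct collisions
yearly = records.groupby('YEAR')['ACCNUM'].agg(
    records='count',
    collisions='nunique',
)
```

Applied to the real data, the `collisions` column would give the number of unique events per year, which may tell a somewhat different story than the record counts.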

3.2 Collision Types

Next let's look at the distribution of collisions by transportation type or vehicular involvement.

In [105]:
# Get the collision types defined by the method of transportation or vehicular involvement
ksi_types_columns = ['PEDESTRIAN', 'CYCLIST', 'AUTOMOBILE', 'MOTORCYCLE', 'TRUCK', 'TRSN_CITY_', 'EMERG_VEH']
ksi_types_columns = [c for c in ksi_types_columns if c in ksi.columns]

# Get the sums for the columns
ksi_types = pd.DataFrame({
    'collision_type': ksi_types_columns,
    'collisions': ksi_summary.loc['sum', ksi_types_columns]
})
ksi_types = ksi_types.sort_values(by = 'collisions', ascending = False)

# Plot the number of collisions per type
ksi_types.iplot(kind = 'bar', x = 'collision_type', y = 'collisions', title = 'KSI Collisions By Type')

We can see that most collisions involve automobiles, followed by pedestrians, while other types of collisions occur much less frequently.
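
To put the bar heights in perspective, the sums can be expressed as shares of all records. The sketch below copies the sums and the record count from the summary table above; the variable names are our own.

```python
# Sums per flag and total record count, copied from the summary table
total_records = 12244
flag_sums = {
    'PEDESTRIAN': 4998,
    'CYCLIST': 1341,
    'AUTOMOBILE': 11053,
    'MOTORCYCLE': 1052,
    'TRUCK': 722,
    'TRSN_CITY_': 818,
    'EMERG_VEH': 26,
}

# Share of records carrying each flag
shares = {name: count / total_records for name, count in flag_sums.items()}

# The flags overlap: their sums add up to more than the number of records
total_flags = sum(flag_sums.values())
```

About 90% of records carry the AUTOMOBILE flag and about 41% carry the PEDESTRIAN flag. Since the flag sums add up to 20010, well above 12244, a single record can carry several flags, so the bars are not mutually exclusive categories.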

3.2.1 Collision Type By Year

Let's also check them by year to see the changes over time.

In [106]:
ksi_types_year = ksi.groupby('YEAR')[ksi_types_columns].sum()
ksi_types_year.iplot(title = 'KSI Collisions Per Year By Type')

It looks like automobile and pedestrian collisions still account for most of the collisions every year, while the other types remain relatively stable.

Note: We also have to keep in mind that automobiles will likely be involved in most if not all severe collisions involving injury (due to it being the most available and used mode of transportation), which means that pedestrians are involved in a relatively large portion of severe collisions (being more vulnerable on the road).

3.2.2 Collision Type By Month

There might be a monthly pattern that can be seen for each collision type. We can quickly check this by plotting the number of collisions per month for each type.

In [168]:
# Aggregate the KSI data by month
ksi_month = ksi[ksi_types_columns].copy()  # copy so adding MONTH does not trigger a SettingWithCopyWarning
ksi_month['MONTH'] = ksi.DATE.dt.month
ksi_month = ksi_month.groupby('MONTH').sum()

# Sort values by month
ksi_month = ksi_month.sort_values(by = 'MONTH')
ksi_month_index = ksi_month.index.to_series().apply(lambda x: calendar.month_abbr[x])
ksi_month = ksi_month.set_index(ksi_month_index)

# Plot the collision type by month
ksi_month.iplot(
    title = 'KSI Collisions By Type and Month',
    subplots = True,
    subplot_titles = True
)

A few notes for the plots above:

  • PEDESTRIAN collisions occur throughout all months with some slightly higher frequencies from September to October, and May to June
  • CYCLIST collisions generally occur after April and start to lower after October (maybe due to winter season)
  • AUTOMOBILE collisions are generally stable, but there is a slight rise from April onwards
  • MOTORCYCLE collisions seem to occur between April and October mostly, where collisions start to increase after March, and lower after October (likely due to season and weather conditions on the road)
  • TRUCK collisions fluctuate with the months of March, August, September, and October being relatively higher than the rest
  • TRSN_CITY_ or city transportation vehicle collisions fluctuate similarly to TRUCK collisions, where higher frequencies are in January, June, July, and August
  • EMERG_VEH or emergency vehicle collisions are relatively rare and do not occur from January to March, from May to June, or in November (this makes sense since all road users and traffic lights have to make way for emergency vehicles)
  • CYCLIST and AUTOMOBILE collisions seem to have similar monthly patterns, where collisions are high starting in May and then drop off near October
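
Since each subplot has its own scale, one way to compare seasonal shapes across types with very different magnitudes is to convert each column into its share of that type's total. This is a minimal sketch with made-up monthly counts (not real KSI values); the names `monthly` and `shares` are hypothetical.

```python
import pandas as pd

# Made-up monthly counts for two collision types (not real KSI values)
monthly = pd.DataFrame(
    {'AUTOMOBILE': [800, 900, 1000, 1300], 'CYCLIST': [10, 20, 60, 110]},
    index=['Jan', 'Mar', 'May', 'Jul'],
)

# Divide each column by its own total so every column sums to 1
shares = monthly / monthly.sum()
```

With shares, a type that is rare overall but strongly seasonal (such as the toy CYCLIST column here) becomes directly comparable to a common but flatter type.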

3.2.3 Collision Type By Day of Week

Finally, there might be particular days of the week where certain types of collisions occur more.

In [173]:
# Aggregate the KSI data by day of week
ksi_day = ksi[ksi_types_columns].copy()  # copy so adding DAY does not trigger a SettingWithCopyWarning
ksi_day['DAY'] = ksi.DATE.dt.weekday
ksi_day = ksi_day.groupby('DAY').sum()

# Sort values by day
ksi_day = ksi_day.sort_values(by = 'DAY')
ksi_day_index = ksi_day.index.to_series().apply(lambda x: calendar.day_abbr[x])
ksi_day = ksi_day.set_index(ksi_day_index)

# Plot the collision type by day of week
ksi_day.iplot(
    title = 'KSI Collisions By Type and Day of Week',
    subplots = True,
    subplot_titles = True
)

There are a few interesting things to note here:

  • PEDESTRIAN, TRUCK and CYCLIST collisions have some similar trends throughout the weekdays, although they differ monthly as seen in the previous section
  • CYCLIST and TRSN_CITY_ public transportation collisions have a very similar trend (are cyclists getting hit by city vehicles?)
  • The weekends (Sat and Sun) have fewer collisions in general except for EMERG_VEH collisions, which only have a total of 6 on Sunday from 2008 to 2018
  • Thursday Thu and Friday Fri have relatively higher numbers of PEDESTRIAN and CYCLIST collisions
  • Fridays Fri generally have a high number of collisions for all types except EMERG_VEH
  • There is a noticeably higher number of MOTORCYCLE collisions on Saturday Sat (maybe most motorcyclists like riding or are unlucky on Saturdays?)
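
The day labels in the plots rely on the weekday numbers and calendar.day_abbr sharing the same convention, where Monday is 0 and Sunday is 6. The pandas dt.weekday accessor follows the same numbering as the standard library's date.weekday(), so a quick check with the standard library is enough. The date below is the date of the first record shown in the data table.

```python
import calendar
import datetime

# Date of the first record in the table (2014-10-24)
day = datetime.date(2014, 10, 24)

# weekday() uses Monday=0 ... Sunday=6, matching calendar.day_abbr indexing
day_number = day.weekday()
label = calendar.day_abbr[day_number]
```

This date falls on a Friday, so the label is 'Fri' under the default (English) locale; calendar.day_abbr is locale-dependent and would return other abbreviations if the locale were changed.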

3.3 Density Mapping

Next, we map the data using a density heat map with the LONGITUDE and LATITUDE coordinates to get an idea of what the spatial distribution is like.

In [80]:
# Get the latitude and longitude min/max ranges
ksi_lon_min, ksi_lon_max = ksi.describe().LONGITUDE[['min', 'max']]
ksi_lat_min, ksi_lat_max = ksi.describe().LATITUDE[['min', 'max']]

# Calculate the latitude and longitude mid points
ksi_lat_mid = (ksi_lat_max + ksi_lat_min) / 2
ksi_lon_mid = (ksi_lon_max + ksi_lon_min) / 2

# Create the map
ksi_map = go.Figure(go.Densitymapbox(
    lat=ksi.LATITUDE,
    lon=ksi.LONGITUDE,
    text = ksi.ACCLASS,  # per-record hover text (text expects one label per point, not a whole DataFrame)
    radius = 3
))
ksi_map.update_layout(margin = map_margins)

# Update the map style and view positions
ksi_map.update_layout(
    title = 'KSI Collision Density, Toronto, ON',
    mapbox_style = map_style,
    mapbox_center_lat = ksi_lat_mid,
    mapbox_center_lon = ksi_lon_mid,
    mapbox_zoom = 10
)
ksi_map.show()

We can see that there are more traffic accidents near the center and north-western portions of downtown Toronto. Collisions are then sparsely distributed in areas outside of the downtown core.
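
For reference, the map centre used above is the midpoint of the coordinate bounding box. The short worked example below plugs in the min and max values from the summary table in section 3.1; the variable names are our own.

```python
# Min and max coordinates copied from the summary statistics table
lon_min, lon_max = -79.638390, -79.125896
lat_min, lat_max = 43.592047, 43.855445

# Midpoint of the bounding box, used as the map centre
lon_mid = (lon_max + lon_min) / 2
lat_mid = (lat_max + lat_min) / 2
```

This gives a centre of roughly (43.7237, -79.3821), which differs slightly from the mean location in the summary table (43.7107, -79.3962) because the midpoint depends only on the extremes and not on where most records lie.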

4.0 Model

Build multi-output models for traffic crash coordinates.

In [9]:
forest = RandomForestClassifier(n_estimators=100, random_state=1)
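
The cell above only constructs the estimator. As a rough illustration of what a multi-output fit on coordinates might look like, the sketch below trains a random forest on two targets at once. Several things here are assumptions and not part of the original notebook: it swaps in RandomForestRegressor (since latitude and longitude are continuous), the two features are random placeholders, and the targets are random points inside the coordinate ranges from the summary table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(1)

# Placeholder features (for example an hour and a ward number); purely random
X = rng.randint(0, 24, size=(50, 2)).astype(float)

# Placeholder targets: random (latitude, longitude) pairs within Toronto's range
lat = 43.59 + 0.26 * rng.rand(50)
lon = -79.64 + 0.51 * rng.rand(50)
y = np.column_stack([lat, lon])

# Random forests accept a 2D target, fitting both outputs jointly
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X, y)

# Each prediction is a (latitude, longitude) pair
pred = model.predict(X[:5])
```

Because the placeholder data is random, the predictions carry no meaning; the point is only that a single forest can be fit with a target of shape (n_samples, 2) and returns predictions of the same width.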